With the ever-growing sizes of pre-trained models (PTMs), it has become an emerging practice to provide users only with inference APIs, namely the model-as-a-service (MaaS) setting. To adapt PTMs with model parameters frozen, most current approaches focus on the input side, seeking powerful prompts that stimulate models to give correct answers. However, we argue that input-side adaptation can be arduous due to the lack of gradient signals, and such methods usually require thousands of API queries, resulting in high computation and time costs. In light of this, we present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side. Specifically, DecT first extracts prompt-stimulated output scores for initial predictions. On top of that, we train an additional decoder network on the output representations to incorporate posterior data knowledge. With gradient-based optimization, DecT can be trained within several seconds and requires only one PTM query per sample. Empirically, we conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $10^3\times$ speed-up.
Metric-based meta-learning is one of the de facto standards in few-shot learning. It is composed of representation learning and metric calculation designs. Previous works construct class representations in different ways, varying from mean output embeddings to covariances and distributions. However, using embeddings in space lacks expressivity and cannot capture class information robustly, while complex statistical modeling makes metric design difficult. In this work, we use tensor fields (``areas'') to model classes from a geometrical perspective for few-shot learning. We present a simple and effective method, dubbed hypersphere prototypes (HyperProto), where class information is represented by hyperspheres of dynamic size with two sets of learnable parameters: the hypersphere's center and radius. Extending from points to areas, hyperspheres are much more expressive than embeddings. Moreover, it is more convenient to perform metric-based classification with hypersphere prototypes than with statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere. Following this idea, we also develop two variants of prototypes under other measurements. Extensive experiments and analysis on few-shot learning tasks across NLP and CV, together with a comparison against 20+ competitive baselines, demonstrate the effectiveness of our approach.
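The point-to-surface distance mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a signed Euclidean distance (the distance to the center minus the radius) and nearest-surface classification; the function names, the two toy prototypes, and their hard-coded centers and radii are hypothetical stand-ins for parameters that would be learned, and this is not claimed to be the authors' implementation.

```python
import math


def distance_to_surface(point, center, radius):
    """Signed Euclidean distance from `point` to a hypersphere's surface.

    Negative inside the sphere, zero on the surface, positive outside.
    (Assumed form; the abstract only says "distance to the surface".)
    """
    return math.dist(point, center) - radius


def classify(point, prototypes):
    """Return the label whose hypersphere surface is nearest to `point`."""
    return min(
        prototypes,
        key=lambda label: distance_to_surface(point, *prototypes[label]),
    )


# Hypothetical prototypes: label -> (center, radius). In a trained model
# both would be learnable parameters rather than constants.
prototypes = {
    "cat": ((0.0, 0.0), 1.0),
    "dog": ((4.0, 0.0), 2.0),
}

query = (2.5, 0.0)
predicted = classify(query, prototypes)
print(predicted)
```

Under this sketch a class with a larger radius claims a larger region, which is one way to read the abstract's point that areas are more expressive than single embedding points.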
A lack of driver vigilance is the main cause of most vehicle crashes. Electroencephalography (EEG) has been a reliable and efficient tool for estimating driver drowsiness. Even though previous studies have developed accurate and robust driver vigilance detection algorithms, these methods still face challenges in the following areas: (a) training with small sample sizes, (b) anomalous signal detection, and (c) subject-independent classification. In this paper, we propose a generalized few-shot model, namely EEG-Fest, to address the aforementioned drawbacks. The EEG-Fest model can (a) classify a query sample's drowsiness with only a few samples, (b) identify whether a query sample is an anomalous signal, and (c) achieve subject-independent classification. The proposed algorithm achieves state-of-the-art results on the SEED-VIG dataset and the SADT dataset. The accuracy of the drowsy class reaches 92% and 94% for 1-shot and 5-shot support samples on the SEED-VIG dataset, and 62% and 78% for 1-shot and 5-shot support samples on the SADT dataset.
U-Net and its extensions have achieved great success in medical image segmentation. However, due to the inherently local nature of ordinary convolution operations, the U-Net encoder cannot effectively extract global context information. In addition, simple skip connections cannot capture salient features. In this work, we propose a fully convolutional segmentation network (CMU-Net) which incorporates hybrid convolutions and multi-scale attention gates. The ConvMixer module extracts global context information by mixing features at distant spatial locations. Moreover, the multi-scale attention gate emphasizes valuable features and achieves efficient skip connections. We evaluate the proposed method using both breast ultrasound datasets and a thyroid ultrasound image dataset. CMU-Net achieves average Intersection over Union (IoU) values of 73.27% and 84.75%, and F1 scores of 84.81% and 91.71%. The code is available at https://github.com/FengheTan9/CMU-Net.
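For readers unfamiliar with the two segmentation metrics reported above, the following minimal sketch computes IoU and F1 for a pair of binary masks using their standard definitions. The function name, the flattened toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; they are not taken from the CMU-Net code.

```python
def iou_and_f1(pred, target):
    """Compute IoU and F1 for two flattened binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, target) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, target) if not p and t)  # false negatives
    union = tp + fp + fn
    if union == 0:
        # Both masks are empty; treating this as a perfect match is an
        # assumed convention, not something stated in the abstract.
        return 1.0, 1.0
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


# Toy masks, invented purely for illustration.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 1, 1, 0]
iou, f1 = iou_and_f1(pred_mask, true_mask)
print(f"IoU={iou:.4f}, F1={f1:.4f}")
```

For a single pair of masks the two metrics are tied by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU.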
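Task generalization has been a long-standing challenge in natural language processing (NLP). Recent research attempts to improve the task generalization ability of pre-trained language models by mapping NLP tasks into human-readable prompted forms. However, these approaches require laborious and inflexible prompts, and different prompts for the same downstream task may yield unstable performance. We propose Unified Schema Prompt, a flexible and extensible prompting method that automatically customizes the learnable prompts for each task according to the task's input schema. It models the knowledge shared between tasks while preserving the characteristics of different task schemas, thereby enhancing task generalization ability. The schema prompt takes the explicit data structure of each task to formulate prompts, so little human effort is involved. To test the task generalization ability of the schema prompt, we conduct schema-prompt-based multitask pre-training on a wide variety of general NLP tasks. The framework achieves strong zero-shot and few-shot generalization performance on 16 unseen downstream tasks from 8 task types (e.g., QA, NLI, etc.). Furthermore, comprehensive analyses demonstrate the effectiveness of each component of the schema prompt, its flexibility in task compositionality, and its ability to improve performance under the full-data fine-tuning setting.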
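With the yearning for the democratization of deep learning, there is increasing demand to implement Transformer-based natural language processing (NLP) models on resource-constrained devices with low latency and high accuracy. Existing BERT pruning methods require domain experts to heuristically handcraft hyperparameters to strike a balance among model size, latency, and accuracy. In this work, we propose AE-BERT, an automatic and efficient BERT pruning framework with efficient evaluation to select a "good" sub-network candidate (with high accuracy) given an overall pruning ratio constraint. Our proposed method requires no human expert experience and achieves better accuracy on many NLP tasks. Our experimental results on the General Language Understanding Evaluation (GLUE) benchmark show that AE-BERT outperforms the state-of-the-art (SOTA) hand-crafted pruning methods on BERT$_{\mathrm{BASE}}$. On QNLI and RTE, we obtain overall pruning ratios of 75\% and 42.8\% while achieving higher accuracy. On MRPC, we obtain a score 4.6 higher than the SOTA at the same overall pruning ratio of 0.5. On STS-B, we can reach a pruning ratio of 40\% with a very small loss in Spearman correlation compared to SOTA hand-crafted pruning methods. Experimental results also show that, after model compression, the inference time of a single BERT$_{\mathrm{BASE}}$ encoder on a Xilinx Alveo U200 FPGA board has a 1.83$\times$ speedup compared to an Intel(R) Xeon(R) Gold 5218 (2.30GHz) CPU, which shows the feasibility of deploying the BERT$_{\mathrm{BASE}}$ sub-networks generated by the proposed method on computation-restricted devices.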
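Adapting large pre-trained models (PTMs) through fine-tuning imposes prohibitive computational and storage burdens. Recent studies of parameter-efficient tuning (PET) find that optimizing only a small portion of parameters conditioned on PTMs can yield performance on par with conventional fine-tuning. Generally, PET methods carefully design parameter-efficient modules (PET modules) that can be applied to arbitrary fine-grained positions inside PTMs. However, the effectiveness of these fine-grained positions relies heavily on sophisticated manual designation and therefore usually produces sub-optimal results. In contrast to manual designation, we explore constructing PET modules in an automatic manner. We automatically \textbf{S}earch for the \textbf{S}parse \textbf{S}tructure of \textbf{P}arameter-\textbf{E}fficient \textbf{T}uning (S$^3$PET). Based on a unified framework of various PET methods, S$^3$PET conducts a differentiable PET structure search through bi-level optimization and proposes a shifted global sigmoid method to explicitly control the number of trainable parameters. Extensive experiments show that S$^3$PET surpasses manual and random structures with fewer trainable parameters. The searched structures preserve 99\% of fine-tuning performance with 0.01\% trainable parameters. Moreover, the advantage of S$^3$PET is amplified under extremely low trainable parameter budgets (0.0009\%$\sim$0.01\%). The searched structures are transferable and explainable, providing suggestions and guidance for the future design of PET methods.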
Question Answering (QA) is a longstanding challenge in natural language processing. Existing QA works mostly focus on specific question types, knowledge domains, or reasoning skills. This specialization in QA research hinders systems from modeling commonalities between tasks and from generalizing to wider applications. To address this issue, we present ProQA, a unified QA paradigm that solves various tasks through a single model. ProQA takes a unified structural prompt as the bridge and improves QA-centric ability through structural prompt-based pre-training. Through a structurally designed prompt-based input schema, ProQA concurrently models knowledge generalization across all QA tasks while keeping knowledge customization for each specific QA task. Furthermore, ProQA is pre-trained on a large-scale synthesized corpus formatted with structural prompts, which equips the model with commonly required QA abilities. Experimental results on 11 QA benchmarks demonstrate that ProQA consistently boosts performance in full-data fine-tuning, few-shot learning, and zero-shot testing scenarios. Furthermore, ProQA exhibits strong ability in both continual learning and transfer learning by taking advantage of the structural prompt.
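Deep learning-based computer-aided diagnosis has achieved unprecedented performance in breast cancer detection. However, most approaches are computationally intensive, which impedes their broader adoption in real-world applications. In this work, we propose an efficient and lightweight multitask learning architecture that classifies and segments breast tumors simultaneously. We incorporate a segmentation task into a tumor classification network, which makes the backbone network learn representations focused on tumor regions. Moreover, we propose a new numerically stable loss function that makes it easy to control the balance between the sensitivity and specificity of cancer detection. The proposed approach is evaluated using a breast ultrasound dataset with 1,511 images. The accuracy, sensitivity, and specificity of tumor classification are 88.6%, 94.1%, and 85.3%, respectively. We validate the model using a virtual mobile device, and the average inference time is 0.35 seconds per image.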
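The three classification metrics reported above follow directly from confusion-matrix counts. The sketch below uses their standard definitions; the function name and the example counts are invented for illustration and are not the paper's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts.

    tp/fn: malignant cases predicted malignant/benign.
    tn/fp: benign cases predicted benign/malignant.
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # true positive rate (recall on malignant)
    specificity = tn / (tn + fp)  # true negative rate (recall on benign)
    return accuracy, sensitivity, specificity


# Hypothetical counts, NOT taken from the paper.
acc, sens, spec = classification_metrics(tp=8, fn=2, tn=15, fp=5)
print(f"accuracy={acc:.3f}, sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Sensitivity and specificity are computed over disjoint sets of cases (malignant and benign respectively), so accuracy is their average weighted by class frequency and always lies between the two.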
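Prompt-learning has become a new paradigm in modern natural language processing. It directly adapts pre-trained language models (PLMs) to cloze-style prediction, autoregressive modeling, or sequence-to-sequence generation, resulting in promising performance on various tasks. However, no standard implementation framework for prompt-learning has been proposed yet, and most existing prompt-learning codebases, which are often unregulated, provide only limited implementations for specific scenarios. Since many details need to be considered in prompt-learning, such as the templating strategy, initialization strategy, and verbalizing strategy, practitioners face obstacles in quickly adapting the desired prompt-learning methods to their applications. In this paper, we present OpenPrompt, a unified, easy-to-use toolkit for conducting prompt-learning over PLMs. OpenPrompt is a research-friendly framework equipped with efficiency, modularity, and extensibility, and its combinability allows users to freely combine different PLMs, task formats, and prompting modules in a unified paradigm. Users can readily deploy prompt-learning frameworks and evaluate their generalization on different NLP tasks without constraints. OpenPrompt is publicly released at https://github.com/thunlp/openprompt.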